{ "cells": [ { "cell_type": "markdown", "id": "6dbb43f2-ba99-4475-9783-cb2948b893da", "metadata": {}, "source": [ "# Build a geospatial pipeline with SageMaker Pipelines\n", "\n", "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "c4f07cc4-66cf-4367-ac7a-d93f3aa963eb", "metadata": { "tags": [] }, "source": [ "## Introduction\n", "\n", "This notebook shows you how to build a geospatial data processing workflow with Amazon SageMaker Pipelines. SageMaker Pipelines is a purpose-built workflow orchestration service that automates all phases of machine learning, from data pre-processing to model monitoring. With an intuitive UI and Python SDK, you can manage repeatable end-to-end ML pipelines at scale. In this example, a workflow is created to query Sentinel-2 imagery based on a list of areas of interest (AOIs) and then generate a data cube for the boundaries provided by each AOI." ] }, { "cell_type": "markdown", "id": "7285797d-ecd8-4fad-8368-5b6a86258d40", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "This notebook runs with the Geospatial 1.0 kernel on an `ml.geospatial.interactive` instance. Note that the following policies need to be attached to the execution role that you use to run this notebook:\n", "- AmazonSageMakerFullAccess\n", "- AmazonSageMakerGeospatialFullAccess\n", "\n", "You can see the policies attached to the role in the IAM console under the 'Permissions' tab. 
If required, add the policies using the 'Add Permissions' button.\n", "\n", "In addition to these policies, ensure that the execution role's trust policy allows the SageMaker geospatial service to assume the role. This can be done by adding the following trust policy using the 'Trust relationships' tab:\n", "\n", "```\n", "{\n", " \"Version\": \"2012-10-17\",\n", " \"Statement\": [\n", " {\n", " \"Effect\": \"Allow\",\n", " \"Principal\": {\n", " \"Service\": [\n", " \"sagemaker.amazonaws.com\",\n", " \"sagemaker-geospatial.amazonaws.com\"\n", " ]\n", " },\n", " \"Action\": \"sts:AssumeRole\"\n", " }\n", " ]\n", "}\n", "```\n", "\n", "### SageMaker Processing Service Quota\n", "\n", "In this example, `ml.m5.xlarge` instances are used to execute the Processing jobs within the pipeline. If you're running this example in an AWS-hosted event environment, the quota should already be adjusted and you can skip to the next step.\n", "\n", "If you're running this example in your own AWS account, you might need to request a service quota increase. 
You can learn more about requesting a quota increase [here](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html).\n", "\n", "To request a service quota increase for this particular example, follow the steps below:\n", "\n", "- From the AWS Console, navigate to `Service Quotas` (or follow this [link](https://us-west-2.console.aws.amazon.com/servicequotas/home?region=us-west-2))\n", "- Click on `AWS Services` in the left nav menu\n", "- Search for `Amazon SageMaker` in the `find services` box and click on the service item\n", "- On the next page, search for `ml.m5.xlarge for processing job usage`\n", "- Select the quota item and click on \"Request quota increase\" and enter an amount of 8 or higher" ] }, { "cell_type": "markdown", "id": "083a6bee-0fcd-4075-8d4e-31b6979b6de7", "metadata": {}, "source": [ "## Create a Pipeline\n", "\n", "To orchestrate your workflows with Amazon SageMaker Model Building Pipelines, you need to generate a directed acyclic graph (DAG) in the form of a JSON pipeline definition. You can generate the JSON pipeline definition using the SageMaker Python SDK. The following steps show how to generate a pipeline definition for a pipeline that uses SageMaker Processing jobs to query and preprocess Sentinel-2 data." 
] }, { "cell_type": "markdown", "id": "b2aac660-dece-4b7d-905b-78a317487002", "metadata": {}, "source": [ "### Import SageMaker SDK and dependencies" ] }, { "cell_type": "code", "execution_count": null, "id": "d128e4a9-2a64-4b5c-9544-2a7bd8fc52fb", "metadata": { "tags": [] }, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker import get_execution_role\n", "from sagemaker.sklearn.processing import ScriptProcessor\n", "from sagemaker.processing import ProcessingInput, ProcessingOutput\n", "from sagemaker.workflow.steps import ProcessingStep\n", "from sagemaker.workflow.parameters import ParameterString\n", "from sagemaker.workflow.execution_variables import ExecutionVariables\n", "from sagemaker.workflow.functions import Join\n", "\n", "sagemaker_session = sagemaker.Session()\n", "execution_role = get_execution_role()\n", "\n", "geospatial_image_uri = (\n", " \"081189585635.dkr.ecr.us-west-2.amazonaws.com/sagemaker-geospatial-v1-0:latest\"\n", ")" ] }, { "cell_type": "markdown", "id": "88e26828-bed0-4ef9-923d-d0b4521f87f9", "metadata": {}, "source": [ "### Define Parameters to Parametrize Pipeline Execution\n", "\n", "You can define pipeline parameters to enable customization of pipeline executions and scheduling without needing to alter the pipeline definition itself. Parameters allow for flexible pipeline executions by setting customizable options.\n", "\n", "The types of parameters supported are:\n", "\n", "- `ParameterString`, representing the Python `str` type.\n", "- `ParameterInteger`, representing the Python `int` type.\n", "- `ParameterFloat`, representing the Python `float` type.\n", "\n", "These parameters allow for a default value to be provided, which can be overridden during pipeline execution. 
The default value should match the parameter type.\n", "\n", "The parameters included in this workflow are:\n", "\n", "- `start_date`: The start date for the time range of interest in ISO format (`YYYY-MM-DD`), or a negative day offset such as `-1` for yesterday\n", "- `end_date`: The end date for the time range of interest in ISO format (`YYYY-MM-DD`), or a negative day offset such as `-1` for yesterday\n", "- `object_boundaries_s3_location`: The S3 URI of the location containing the AOI data (object boundaries)\n", "- `object_boundaries_file_name`: The name of the file containing the AOI data (object boundaries), with \"object_boundaries.json\" as the default\n", "- `s3_bucket_output_data`: The S3 bucket name for the output data, with the SageMaker default bucket as the default" ] }, { "cell_type": "code", "execution_count": null, "id": "8454e644-f12e-4a86-bce0-778fb31d1f65", "metadata": { "tags": [] }, "outputs": [], "source": [ "param_start_date = ParameterString(\"start_date\")\n", "param_end_date = ParameterString(\"end_date\")\n", "param_object_boundaries_file_name = ParameterString(\n", " \"object_boundaries_file_name\", default_value=\"object_boundaries.json\"\n", ")\n", "param_object_boundaries_s3_location = ParameterString(\"object_boundaries_s3_location\")\n", "param_s3_bucket_output_data = ParameterString(\n", " \"s3_bucket_output_data\", default_value=sagemaker_session.default_bucket()\n", ")" ] }, { "cell_type": "markdown", "id": "501ab245-a9e1-4a73-b977-f5acc705df4a", "metadata": {}, "source": [ "### Define a Processing Step for querying Sentinel-2 data\n", "\n", "The following cell writes a file `fetch_aoi_meta_data.py`, which contains a script to collect Sentinel-2 metadata based on the AOIs provided in the object boundary file. You can update the script and rerun this cell to overwrite the file." 
] }, { "cell_type": "code", "execution_count": null, "id": "39dcb2ab-8f3c-41df-a84c-e07be0edb1b9", "metadata": {}, "outputs": [], "source": [ "%%writefile fetch_aoi_meta_data.py\n", "\n", "import os\n", "import pickle\n", "import sys\n", "import subprocess\n", "import json\n", "import time\n", "import geopandas\n", "import shapely\n", "import shapely.geometry\n", "from shapely.ops import unary_union\n", "import logging\n", "from datetime import datetime, timedelta\n", "import boto3\n", "\n", "\n", "def get_logger(log_level):\n", " logger = logging.getLogger(\"processing\")\n", "\n", " console_handler = logging.StreamHandler(sys.stdout)\n", " console_handler.setFormatter(logging.Formatter(\"%(asctime)s [%(levelname)s] %(message)s\"))\n", " console_handler.setLevel(log_level)\n", "\n", " logger.addHandler(console_handler)\n", " logger.setLevel(log_level)\n", " return logger\n", "\n", "\n", "def parse_date(date_str):\n", " # string starts with '-', assumes delta in days (e.g. -1 for yesterday)\n", " if date_str.startswith(\"-\"):\n", " days_delta = int(date_str) * -1\n", " target_date = datetime.today() - timedelta(days=days_delta)\n", " date_str = target_date.strftime(\"%Y-%m-%d\")\n", " # convert to datetime to validate format\n", " date_time_obj = datetime.strptime(date_str, \"%Y-%m-%d\")\n", " return date_str\n", "\n", "\n", "def get_date_range(args):\n", " start_date = parse_date(args[2].strip())\n", " end_date = parse_date(args[3].strip())\n", " return start_date, end_date\n", "\n", "\n", "def get_s2_items(objects_gdf, start_date, end_date):\n", " session = boto3.Session()\n", " geospatial_client = session.client(service_name=\"sagemaker-geospatial\", region_name=\"us-west-2\")\n", "\n", " s2_items = {}\n", "\n", " for i, row in objects_gdf.iterrows():\n", " aoi_geometry = row[\"geometry\"]\n", " if type(row[\"geometry\"]) == shapely.geometry.multipolygon.MultiPolygon:\n", " aoi_geometry = unary_union(row[\"geometry\"])\n", "\n", " bbox = aoi_geometry.bounds\n", 
" aoi_bbox = shapely.geometry.box(*bbox, ccw=True)\n", "\n", " search_params = {\n", " \"Arn\": \"arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8\", # Sentinel-2 L2A data\n", " \"RasterDataCollectionQuery\": {\n", " \"AreaOfInterest\": {\n", " \"AreaOfInterestGeometry\": {\n", " \"PolygonGeometry\": {\n", " \"Coordinates\": shapely.geometry.mapping(aoi_bbox)[\"coordinates\"]\n", " }\n", " }\n", " },\n", " \"TimeRangeFilter\": {\n", " \"StartTime\": f\"{start_date}T00:00:00Z\",\n", " \"EndTime\": f\"{end_date}T23:59:59Z\",\n", " },\n", " \"PropertyFilters\": {\n", " \"Properties\": [\n", " {\"Property\": {\"EoCloudCover\": {\"LowerBound\": 0.0, \"UpperBound\": 50.0}}}\n", " ],\n", " \"LogicalOperator\": \"AND\",\n", " },\n", " },\n", " }\n", "\n", " next_token = True\n", " item_count = 0\n", " while next_token:\n", " search_result = geospatial_client.search_raster_data_collection(**search_params)\n", " for item in search_result[\"Items\"]:\n", " if item[\"Id\"] not in s2_items:\n", " s2_items[item[\"Id\"]] = {\"objects\": [], \"data\": item}\n", " s2_items[item[\"Id\"]][\"objects\"].append(row[\"objectid\"])\n", " item_count += len(search_result[\"Items\"])\n", " next_token = search_result.get(\"NextToken\")\n", " search_params[\"NextToken\"] = next_token\n", "\n", " logger.debug(\"Found {} items for object {}\".format(item_count, row[\"objectid\"]))\n", "\n", " return s2_items\n", "\n", "\n", "if __name__ == \"__main__\":\n", " logger = get_logger(logging.INFO)\n", "\n", " logger.info(\"Starting processing\")\n", " logger.debug(f\"Argument List: {str(sys.argv)}\")\n", "\n", " object_boundaries_file_name = sys.argv[1]\n", " start_date, end_date = get_date_range(sys.argv)\n", "\n", " logger.info(f\"Executing for date range: [{start_date}, {end_date}]\")\n", "\n", " sys.stdout.flush()\n", "\n", " objects_gdf = geopandas.read_file(\n", " f\"/opt/ml/processing/input/objects/{object_boundaries_file_name}\"\n", " )\n", " 
s2_items = get_s2_items(objects_gdf, start_date, end_date)\n", "\n", " logger.info(\n", " \"Found {} Sentinel-2 tiles from {} to {}\".format(len(s2_items), start_date, end_date)\n", " )\n", "\n", " output_path = \"/opt/ml/processing/output\"\n", "\n", " for scene_id, item in s2_items.items():\n", " if \"_0_L2A\" not in scene_id:\n", " continue\n", " output_file_path = f\"{output_path}/{scene_id}.json\"\n", " item[\"objects\"] = list(set(item[\"objects\"]))\n", " with open(output_file_path, \"w\", encoding=\"utf8\") as f:\n", " json.dump(item, f, default=str)\n", " logger.debug(f\"Written output: {output_file_path}\")\n", "\n", " logger.info(\"Written all outputs\")" ] }, { "cell_type": "markdown", "id": "b704b1ab-f455-4a36-a077-1a055cf9dd29", "metadata": {}, "source": [ "This Processing step executes the script on the file containing the AOIs and the date range provided as pipeline parameters. The output will be written to S3 and passed to the next Processing step as an input." ] }, { "cell_type": "code", "execution_count": null, "id": "f6184a2d-ef81-4c45-a6d8-2f5482bd53cd", "metadata": { "tags": [] }, "outputs": [], "source": [ "processor = ScriptProcessor(\n", " command=[\"python3\"],\n", " image_uri=geospatial_image_uri,\n", " role=execution_role,\n", " instance_count=1,\n", " instance_type=\"ml.m5.xlarge\",\n", ")\n", "\n", "step_process_fetch_data = ProcessingStep(\n", " name=\"data-fetch\",\n", " processor=processor,\n", " code=\"fetch_aoi_meta_data.py\",\n", " inputs=[\n", " ProcessingInput(\n", " source=param_object_boundaries_s3_location,\n", " destination=\"/opt/ml/processing/input/objects/\",\n", " s3_data_distribution_type=\"FullyReplicated\",\n", " ),\n", " ],\n", " outputs=[\n", " ProcessingOutput(\n", " output_name=\"scene_metadata\",\n", " source=\"/opt/ml/processing/output/\",\n", " destination=Join(\n", " on=\"/\",\n", " values=[\n", " \"s3:/\",\n", " param_s3_bucket_output_data,\n", " \"processing-geospatial-pipeline-example\",\n", " 
ExecutionVariables.PIPELINE_EXECUTION_ID,\n", " \"output/scene_metadata\",\n", " ],\n", " ),\n", " )\n", " ],\n", " job_arguments=[param_object_boundaries_file_name, param_start_date, param_end_date],\n", ")" ] }, { "cell_type": "markdown", "id": "3f5da9ba-d12c-41cd-8352-5234a112ea79", "metadata": {}, "source": [ "### Define Processing Step to generate data cubes\n", "\n", "For the second Processing step, the next cell writes the file `generate_data_cube_clipped.py`. This script will use the inputs from the first Processing step and generate a data cube, including RGB and near infrared (NIR) bands. In addition, the Normalized difference vegetation index (NDVI) will be computed and added to the data cube. For each AOI, a dedicated data cube is created, clipped to the boundaries for the corresponding object." ] }, { "cell_type": "code", "execution_count": null, "id": "38a7d8bb-9391-46c2-8df6-af7658dbfb72", "metadata": {}, "outputs": [], "source": [ "%%writefile generate_data_cube_clipped.py\n", "\n", "import os\n", "import pickle\n", "import sys\n", "import subprocess\n", "import warnings\n", "import json\n", "import time\n", "import geopandas\n", "import pandas as pd\n", "import numpy as np\n", "import shapely\n", "from shapely.geometry import shape\n", "import xarray as xr\n", "import rioxarray\n", "from rioxarray.exceptions import NoDataInBounds\n", "import gc\n", "import logging\n", "import datetime\n", "\n", "MAX_CLOUD_COVER_AOI = 0.3\n", "\n", "# Cloud mask values:\n", "# 8 - Cloud medium probability\n", "# 9 - Cloud high probability\n", "# 10 - Thin cirrus\n", "# For details see here: https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm-overview\n", "SCL_CLOUD_MASK_CLASSES = [8, 9, 10]\n", "SCL_MASK_CLOUD_FREE_CLASSES = [x for x in list(range(0, 12)) if x not in SCL_CLOUD_MASK_CLASSES]\n", "\n", "\n", "def get_logger(log_level):\n", " logger = logging.getLogger(\"processing\")\n", "\n", " console_handler = 
logging.StreamHandler(sys.stdout)\n", " console_handler.setFormatter(logging.Formatter(\"%(asctime)s [%(levelname)s] %(message)s\"))\n", " console_handler.setLevel(log_level)\n", "\n", " logger.addHandler(console_handler)\n", " logger.setLevel(log_level)\n", " return logger\n", "\n", "\n", "def s2_scene_id_to_cog_path(scene_id):\n", " parts = scene_id.split(\"_\")\n", " s2_qualifier = \"{}/{}/{}/{}/{}/{}\".format(\n", " parts[1][0:2],\n", " parts[1][2],\n", " parts[1][3:5],\n", " parts[2][0:4],\n", " str(int(parts[2][4:6])),\n", " \"_\".join(parts),\n", " )\n", " return f\"https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/{s2_qualifier}/\"\n", "\n", "\n", "def scene_id_to_datetime(scene_id):\n", " dt = pd.to_datetime(scene_id.split(\"_\")[-3])\n", " return dt\n", "\n", "\n", "def get_aoi_cloud_free_ratio(scl_raster, aoi):\n", " kwargs = {\"nodata\": np.nan}\n", " scl_raster = scl_raster.rio.reproject(\"EPSG:4326\", **kwargs)\n", " # clip to AOI\n", " scl_raster_clipped = scl_raster.rio.clip(geometries=aoi)\n", " # get cloud-free ratio\n", " scl_mask_pixel_count = scl_raster_clipped.SCL.data.size - np.count_nonzero(\n", " np.isnan(scl_raster_clipped.SCL.data)\n", " ) # get size of SCL mask in num pixels (excl. 
any nans)\n", " scl_cloud_free_pixel_count = np.isin(\n", " scl_raster_clipped.SCL.data, SCL_MASK_CLOUD_FREE_CLASSES\n", " ).sum() # count pixels that are non-cloud class\n", " cloud_free_ratio = scl_cloud_free_pixel_count / scl_mask_pixel_count\n", "\n", " return cloud_free_ratio\n", "\n", "\n", "if __name__ == \"__main__\":\n", " logger = get_logger(logging.INFO)\n", "\n", " logger.info(\"Starting processing\")\n", " logger.debug(f\"Argument List: {str(sys.argv)}\")\n", "\n", " sys.stdout.flush()\n", "\n", " object_boundaries_file_name = sys.argv[1]\n", " objects_gdf = geopandas.read_file(\n", " f\"/opt/ml/processing/input/objects/{object_boundaries_file_name}\"\n", " )\n", "\n", " scene_meta_data_path = \"/opt/ml/processing/input/scene_meta_data/\"\n", " scene_meta_items = []\n", " for current_path, sub_dirs, files in os.walk(scene_meta_data_path):\n", " for file in files:\n", " if file.endswith(\".json\"):\n", " full_file_path = os.path.join(scene_meta_data_path, current_path, file)\n", " with open(full_file_path, \"r\") as f:\n", " scene_meta_items.append(json.load(f))\n", "\n", " item_count_total = len(scene_meta_items)\n", " item_count_current = 0\n", " elapsed_time_batch = 0\n", " logger.info(\"Received {} scenes to process\".format(item_count_total))\n", "\n", " for scene_meta_item in scene_meta_items:\n", " if item_count_current > 0 and item_count_current % 5 == 0:\n", " logger.info(\n", " \"Processed {}/{} scenes ({}s per scene)\".format(\n", " item_count_current,\n", " item_count_total,\n", " round(elapsed_time_batch / item_count_current, 2),\n", " )\n", " )\n", " item_count_current += 1\n", "\n", " item = scene_meta_item[\"data\"]\n", " logger.debug(\"Processing scene: {}\".format(item[\"Id\"]))\n", "\n", " start = time.time()\n", "\n", " s2_scene_id = item[\"Id\"]\n", " s2_cog_prefix = s2_scene_id_to_cog_path(s2_scene_id)\n", " date = scene_id_to_datetime(s2_scene_id)\n", "\n", " band_ids = [\n", " \"B02\",\n", " \"B03\",\n", " \"B04\",\n", " 
\"B08\",\n", " ]\n", "\n", " bands = []\n", " for band_id in band_ids:\n", " band_data = rioxarray.open_rasterio(\n", " f\"{s2_cog_prefix}/{band_id}.tif\", masked=True, band_as_variable=True\n", " )\n", " band_data = band_data.rename(name_dict={\"band_1\": band_id})\n", " bands.append(band_data)\n", "\n", " scl_mask = rioxarray.open_rasterio(\n", " f\"{s2_cog_prefix}/SCL.tif\", masked=True, band_as_variable=True\n", " )\n", " scl_mask = scl_mask.rename(name_dict={\"band_1\": \"SCL\"})\n", " with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", " scl_mask = scl_mask.interp(x=bands[0][\"x\"], y=bands[0][\"y\"])\n", "\n", " bands.append(scl_mask)\n", " s2_cube = xr.merge(objects=bands)\n", " del bands\n", " gc.collect()\n", "\n", " # assign time dimension\n", " s2_cube = s2_cube.assign_coords(time=date) # call this 'time'\n", " # reproject to EPSG:4326\n", " kwargs = {\"nodata\": np.nan}\n", " s2_cube = s2_cube.rio.reproject(\"EPSG:4326\", **kwargs)\n", "\n", " # loop over objects that intersect with s2 item\n", " intersect_gdf = objects_gdf.loc[objects_gdf[\"objectid\"].isin(scene_meta_item[\"objects\"])]\n", " for index, row in intersect_gdf.iterrows():\n", " object_id = row[\"objectid\"]\n", "\n", " # check cloud-free ratio\n", " geometries = [row[\"geometry\"]]\n", " cloud_free_ratio = get_aoi_cloud_free_ratio(scl_mask, geometries)\n", " if (1 - float(cloud_free_ratio)) > MAX_CLOUD_COVER_AOI:\n", " logger.debug(\n", " f\"AOI cloud cover ratio too high ({round(1-cloud_free_ratio,3)}), skipping object {object_id} in scene {s2_scene_id}...\"\n", " )\n", " del cloud_free_ratio\n", " else:\n", " logger.debug(\n", " f\"AOI cloud cover ratio below threshold ({round(1-cloud_free_ratio,3)}), processing object {object_id} in scene {s2_scene_id}...\"\n", " )\n", " try:\n", " clipped = s2_cube.rio.clip(geometries=geometries)\n", " except NoDataInBounds as e:\n", " logger.warn(\n", " \"Skipping {} in {}: no data in bounds\".format(object_id, 
s2_scene_id)\n", " )\n", " continue\n", "\n", " clipped_cloud_free = clipped.where(clipped.SCL.isin(SCL_MASK_CLOUD_FREE_CLASSES))\n", " # calculate index and add back to the original data cube\n", " clipped[\"NDVI\"] = (clipped_cloud_free.B08 - clipped_cloud_free.B04) / (\n", " clipped_cloud_free.B08 + clipped_cloud_free.B04\n", " )\n", "\n", " file_name = f\"{object_id}-{s2_scene_id}.nc\"\n", " output_file_path = f\"/opt/ml/processing/output/{file_name}\"\n", "\n", " clipped.to_netcdf(output_file_path)\n", "\n", " logger.debug(f\"Written output: {output_file_path}\")\n", "\n", " del clipped\n", " del geometries\n", " del cloud_free_ratio\n", " gc.collect()\n", "\n", " # explicit dereference to keep memory usage low\n", " del s2_cube\n", " del scl_mask\n", " sys.stdout.flush()\n", " gc.collect()\n", "\n", " elapsed_time = time.time() - start\n", " elapsed_time_batch += elapsed_time\n", "\n", " logger.debug(\"Processed scene {}: {}s\".format(s2_scene_id, elapsed_time))" ] }, { "cell_type": "markdown", "id": "78086682-9485-425e-812d-a6e7587272e8", "metadata": {}, "source": [ "The second Processing step will depend on the first step and its input. The second step will perform the computations and write the preprocessed data to S3." 
] }, { "cell_type": "code", "execution_count": null, "id": "b57a8faf-c080-4336-b348-be5b9410f028", "metadata": { "tags": [] }, "outputs": [], "source": [ "processor = ScriptProcessor(\n", " command=[\"python3\"],\n", " image_uri=geospatial_image_uri,\n", " role=execution_role,\n", " instance_count=8,\n", " instance_type=\"ml.m5.xlarge\",\n", ")\n", "\n", "step_process_gen_data_cube = ProcessingStep(\n", " name=\"generate-data-cube-clip\",\n", " processor=processor,\n", " code=\"generate_data_cube_clipped.py\",\n", " inputs=[\n", " ProcessingInput(\n", " source=param_object_boundaries_s3_location,\n", " destination=\"/opt/ml/processing/input/objects/\",\n", " s3_data_distribution_type=\"FullyReplicated\",\n", " ),\n", " ProcessingInput(\n", " source=Join(\n", " on=\"/\",\n", " values=[\n", " \"s3:/\",\n", " param_s3_bucket_output_data,\n", " \"processing-geospatial-pipeline-example\",\n", " ExecutionVariables.PIPELINE_EXECUTION_ID,\n", " \"output/scene_metadata\",\n", " ],\n", " ),\n", " destination=\"/opt/ml/processing/input/scene_meta_data/\",\n", " s3_data_distribution_type=\"ShardedByS3Key\",\n", " ),\n", " ],\n", " outputs=[\n", " ProcessingOutput(\n", " output_name=\"data_cube_clipped\",\n", " source=\"/opt/ml/processing/output/\",\n", " destination=Join(\n", " on=\"/\",\n", " values=[\n", " \"s3:/\",\n", " param_s3_bucket_output_data,\n", " \"processing-geospatial-pipeline-example\",\n", " ExecutionVariables.PIPELINE_EXECUTION_ID,\n", " \"output/processed\",\n", " ],\n", " ),\n", " )\n", " ],\n", " job_arguments=[param_object_boundaries_file_name],\n", " depends_on=[step_process_fetch_data],\n", ")" ] }, { "cell_type": "markdown", "id": "ae3ca072-50ab-493d-90e0-947598402670", "metadata": {}, "source": [ "### Define the Pipeline layout\n", "\n", "In this section, the previously created steps and parameters will be combined into a Pipeline so it can be executed.\n", "\n", "A pipeline requires a `name`, `parameters`, and `steps`. 
The name of a pipeline must be unique within an `(account, region)` pair." ] }, { "cell_type": "code", "execution_count": null, "id": "96037d51-46ac-45c2-b032-9b12c9565c38", "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker.workflow.pipeline import Pipeline\n", "\n", "pipeline_steps = [step_process_fetch_data, step_process_gen_data_cube]\n", "\n", "pipeline_parameters = [\n", " param_start_date,\n", " param_end_date,\n", " param_object_boundaries_file_name,\n", " param_object_boundaries_s3_location,\n", " param_s3_bucket_output_data,\n", "]\n", "\n", "pipeline = Pipeline(\n", " name=\"processing-geospatial-pipeline\",\n", " parameters=pipeline_parameters,\n", " steps=pipeline_steps,\n", ")" ] }, { "cell_type": "markdown", "id": "5a06cbb7-0ce8-400b-8638-9b7ed1896dec", "metadata": {}, "source": [ "### (Optional) Examining the pipeline definition\n", "\n", "The JSON of the pipeline definition can be examined to confirm the pipeline is well-defined and the parameters and step properties resolve correctly." ] }, { "cell_type": "code", "execution_count": null, "id": "594655ff-7392-419d-bb08-24135c22aebd", "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "\n", "definition = json.loads(pipeline.definition())\n", "definition" ] }, { "cell_type": "markdown", "id": "e5f3b2e7-d970-4a53-9c47-95b194eecb9b", "metadata": {}, "source": [ "### Upsert Pipeline to persist definition\n", "\n", "Submit the pipeline definition to the Pipeline service. The Pipeline service uses the role that is passed in to create all the jobs defined in the steps." 
] }, { "cell_type": "code", "execution_count": null, "id": "432d3940-b805-4091-aaaf-a8e102fb19d8", "metadata": { "tags": [] }, "outputs": [], "source": [ "pipeline.upsert(role_arn=execution_role)" ] }, { "cell_type": "markdown", "id": "b96d7b8a-3b0b-4690-a6f4-f4af61a4d830", "metadata": {}, "source": [ "After the pipeline has been created, you are able to inspect the created pipeline.\n", "\n", "For this, you can navigate to the SageMaker Studio Resources tab in the left menu and click on `Pipelines`.\n", "\n", "You should be able to see the \"processing-geospatial-pipeline\" in the list. You can click on it and then navigate to the `Graph` tab to see a visual representation of the created pipeline.\n", "\n", "![SageMaker Pipeline](images/processing-geospatial-pipeline.png)" ] }, { "cell_type": "markdown", "id": "c30c7252-c1cf-49dd-9cff-c6e19acfeaeb", "metadata": {}, "source": [ "## Execute the Pipeline" ] }, { "cell_type": "markdown", "id": "136f6d9a-0949-41e7-bbb3-c37d805d950c", "metadata": {}, "source": [ "After the pipeline has been created, a pipeline execution can be triggered either via the Pipeline UI, or by using either the Python SDK or the service API via boto3.\n", "\n", "Before an execution is submitted, we'll upload an example set of AOI data to S3 which serves as an input for the created pipeline." 
] }, { "cell_type": "code", "execution_count": null, "id": "467c70d2-32a5-40a8-90ae-13c858c1bda1", "metadata": { "tags": [] }, "outputs": [], "source": [ "import geopandas\n", "\n", "gdf = geopandas.read_file(\"data/object_boundaries.json\")\n", "gdf" ] }, { "cell_type": "code", "execution_count": null, "id": "a77f9352-a904-4974-91a6-6e80d737e31c", "metadata": { "tags": [] }, "outputs": [], "source": [ "import boto3\n", "import json\n", "\n", "bucket_name = sagemaker_session.default_bucket()\n", "file_name_object_boundaries = \"object_boundaries.json\"\n", "bucket_prefix_input_object_boundaries = f\"processing-geospatial-pipeline-example/input/aoi\"\n", "\n", "# upload crop field boundaries to S3\n", "s3 = boto3.resource(\"s3\")\n", "s3object = s3.Object(\n", " bucket_name, f\"{bucket_prefix_input_object_boundaries}/{file_name_object_boundaries}\"\n", ")\n", "response = s3object.put(Body=open(f\"data/{file_name_object_boundaries}\", \"rb\"))" ] }, { "cell_type": "markdown", "id": "2ef495b0-7ec6-4c56-bd7a-4d5c06c81273", "metadata": {}, "source": [ "### (Optional) Visualize the AOI input data\n", "\n", "The following cells will create an interactive map with the Amazon SageMaker geospatial Map SDK. The input data in the geopandas dataframe will be visualized in the embedded map." 
] }, { "cell_type": "code", "execution_count": null, "id": "2ecf10d3-f77a-4152-9ab7-9177164da61e", "metadata": { "tags": [] }, "outputs": [], "source": [ "import boto3\n", "import sagemaker_geospatial_map\n", "\n", "session = boto3.Session()\n", "geospatial_client = session.client(service_name=\"sagemaker-geospatial\")\n", "\n", "Map = sagemaker_geospatial_map.create_map({\"is_raster\": True})\n", "Map.set_sagemaker_geospatial_client(geospatial_client)" ] }, { "cell_type": "code", "execution_count": null, "id": "94116758-3f62-4120-8fe8-d61f18e56b58", "metadata": { "tags": [] }, "outputs": [], "source": [ "Map.render()" ] }, { "cell_type": "code", "execution_count": null, "id": "8702b8a4-4dab-4660-94c5-a2ec579a0236", "metadata": {}, "outputs": [], "source": [ "dataset = Map.add_dataset(\n", " {\"data\": gdf, \"label\": \"Object boundaries (AOIs)\"}, auto_create_layers=True\n", ")" ] }, { "cell_type": "markdown", "id": "3798ccb0-df39-4727-86c9-0573c606ddff", "metadata": {}, "source": [ "### Start a pipeline execution\n", "\n", "After the input data has been uploaded to S3, an execution of the Pipeline can be triggered by providing the mandatory Pipeline parameters and invoking the `pipeline.start` function." 
] }, { "cell_type": "code", "execution_count": null, "id": "bc4ad247-24ea-4ae2-a89f-2772762684cd", "metadata": { "tags": [] }, "outputs": [], "source": [ "pipeline_execution_parameters = {\n", " \"start_date\": \"2017-07-01\",\n", " \"end_date\": \"2018-10-01\",\n", " \"object_boundaries_s3_location\": f\"s3://{bucket_name}/{bucket_prefix_input_object_boundaries}/\",\n", "}\n", "\n", "execution = pipeline.start(parameters=pipeline_execution_parameters)" ] }, { "cell_type": "markdown", "id": "926cc7db-26f5-43e7-a2fd-2f85a06b184c", "metadata": {}, "source": [ "After the pipeline execution has been started, you can also see it in the Pipelines UI.\n", "\n", "Navigate back to the \"processing-geospatial-pipeline\" in the Pipelines UI and you can see the execution in the `Executions` tab. You can double-click it to see the details of this execution.\n", "\n", "![Pipeline Execution](images/pipeline-execution.png)" ] }, { "cell_type": "markdown", "id": "469e8167-282e-46c7-a15f-a058da720fb1", "metadata": {}, "source": [ "### (Optional) Alternative way to execute the created Pipeline\n", "\n", "Apart from the Pipeline Python SDK, you can also execute a Pipeline by using the boto3 library. Uncomment the code in the cell below (remove the enclosing triple quotes) to trigger a Pipeline execution via boto3." 
] }, { "cell_type": "code", "execution_count": null, "id": "3b6c0394-9205-4246-9fdc-60ce529fb42a", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "import boto3\n", "import time\n", "\n", "session = boto3.Session()\n", "sagemaker_client = session.client(service_name=\"sagemaker\")\n", "\n", "pipeline_execution_parameters = {\n", " \"start_date\": \"2017-07-01\",\n", " \"end_date\": \"2018-10-01\",\n", " \"object_boundaries_s3_location\": f\"s3://{bucket_name}/{bucket_prefix_input_object_boundaries}/\",\n", "}\n", "\n", "# transform the parameter dictionary into a list format\n", "pipeline_execution_parameters_list = []\n", "for key, value in pipeline_execution_parameters.items():\n", " pipeline_execution_parameters_list.append({\"Name\": key, \"Value\": value})\n", "\n", "response = sagemaker_client.start_pipeline_execution(\n", " PipelineName=\"processing-geospatial-pipeline\",\n", " PipelineExecutionDisplayName=f\"execution-{int(time.time())}\",\n", " PipelineParameters=pipeline_execution_parameters_list,\n", ")\n", "\"\"\"" ] }, { "cell_type": "markdown", "id": "00080f1e-5721-484f-a4a8-aaa0dc24fcc4", "metadata": { "tags": [] }, "source": [ "### Pipeline Operations: Examining and Waiting for Pipeline Execution\n", "\n", "After the execution has been started, you can examine the state of the execution with the following command." ] }, { "cell_type": "code", "execution_count": null, "id": "9b004c2a-d04f-4e9f-a5da-74c05b81ec09", "metadata": {}, "outputs": [], "source": [ "execution.describe()" ] }, { "cell_type": "markdown", "id": "e1fc43c6-0445-4209-8805-fd056432c6af", "metadata": {}, "source": [ "By default, the Pipeline execution does not block the notebook execution. If you want to block the notebook execution until the Pipeline execution is finished, you can use the `execution.wait()` function. Execute the next cell before continuing with the remaining instructions in this notebook." 
] }, { "cell_type": "code", "execution_count": null, "id": "9f74a1a3-9704-4aec-a467-2e77b25b8ffa", "metadata": { "tags": [] }, "outputs": [], "source": [ "execution.wait()" ] }, { "cell_type": "markdown", "id": "198459cc-b782-4cf2-9fce-267106d94801", "metadata": {}, "source": [ "List the steps in the execution. These are the steps in the pipeline that have been resolved by the step executor service." ] }, { "cell_type": "code", "execution_count": null, "id": "52fe0bec-24b3-437d-8669-d7013d74025a", "metadata": { "tags": [] }, "outputs": [], "source": [ "execution.list_steps()" ] }, { "cell_type": "markdown", "id": "899afb5d-007f-4f8f-a99f-77a8ef2d304b", "metadata": {}, "source": [ "### Inspecting Pipeline execution results\n", "\n", "After the Pipeline execution has finished, we can access the metadata of the underlying Processing jobs and identify the S3 output path of the job." ] }, { "cell_type": "code", "execution_count": null, "id": "4060a983-4a7b-4ed2-91b9-4381d5a467bd", "metadata": { "tags": [] }, "outputs": [], "source": [ "execution.list_steps()[1][\"Metadata\"][\"ProcessingJob\"][\"Arn\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "2438f503-f1b2-4017-b513-ad5a66e18dfb", "metadata": { "tags": [] }, "outputs": [], "source": [ "import boto3\n", "\n", "session = boto3.Session()\n", "sagemaker_client = session.client(service_name=\"sagemaker\")\n", "\n", "output_job_name = execution.list_steps()[0][\"Metadata\"][\"ProcessingJob\"][\"Arn\"].split(\"/\")[-1]\n", "job_descriptor = sagemaker_client.describe_processing_job(ProcessingJobName=output_job_name)\n", "\n", "s3_output_uri = job_descriptor[\"ProcessingOutputConfig\"][\"Outputs\"][0][\"S3Output\"][\"S3Uri\"]\n", "s3_output_uri" ] }, { "cell_type": "markdown", "id": "3abc8049-b441-4c13-9c8a-4ad0e9c56ac6", "metadata": {}, "source": [ "We'll load the generated data cubes from S3 into memory for a specific AOI (the AOI of object 1), and will inspect the generated data cube." 
] }, { "cell_type": "code", "execution_count": null, "id": "2e6c223a-3c0f-4dbf-a075-4cb786a7de13", "metadata": { "tags": [] }, "outputs": [], "source": [ "output_bucket_name = s3_output_uri.split(\"/\")[2]\n", "output_bucket_prefix = \"/\".join(s3_output_uri.split(\"/\")[3:])\n", "\n", "s3_bucket = session.resource(\"s3\").Bucket(output_bucket_name)\n", "\n", "raster_keys = []\n", "for s3_object in s3_bucket.objects.filter(Prefix=output_bucket_prefix).all():\n", " filename = str(s3_object.key).split(\"/\")[-1]\n", " if filename.startswith(\"1-\"):\n", " raster_keys.append(str(s3_object.key))" ] }, { "cell_type": "code", "execution_count": null, "id": "7105973b", "metadata": {}, "outputs": [], "source": [ "!pip install -q --root-user-action=ignore s3fs" ] }, { "cell_type": "code", "execution_count": null, "id": "b77e6450-c8a6-40d1-b32e-ac225de8b881", "metadata": { "tags": [] }, "outputs": [], "source": [ "import xarray as xr\n", "import s3fs\n", "\n", "fs = s3fs.S3FileSystem(anon=False)\n", "\n", "rasters = []\n", "for raster_key in raster_keys:\n", " s3_path = f\"s3://{output_bucket_name}/{raster_key}\"\n", " with fs.open(s3_path) as file_obj:\n", " try:\n", " with xr.open_dataset(file_obj, engine=\"h5netcdf\", decode_coords=\"all\") as raster:\n", " rasters.append(raster.load())\n", " except ValueError:\n", " print(\"Error while loading {}. 
Skipping.\".format(raster_key))\n", "\n", "data_cube = xr.concat(objs=rasters, coords=\"minimal\", dim=\"time\", join=\"outer\")\n", "data_cube = data_cube.sortby(\"time\")" ] }, { "cell_type": "code", "execution_count": null, "id": "26bc0318-4337-42a7-8ad8-6b575b630bbf", "metadata": {}, "outputs": [], "source": [ "data_cube" ] }, { "cell_type": "code", "execution_count": null, "id": "e1e28f0c-0931-4b6b-b52e-287223cfb921", "metadata": { "tags": [] }, "outputs": [], "source": [ "data_cube.NDVI.sel(time=data_cube.time.min()).plot(cmap=\"RdYlGn\", vmin=0, vmax=1)" ] }, { "cell_type": "code", "execution_count": null, "id": "ed85e1f1-122f-43b4-9a45-caf72da09452", "metadata": { "tags": [] }, "outputs": [], "source": [ "data_cube.NDVI.sel(time=data_cube.time.max()).plot(cmap=\"RdYlGn\", vmin=0, vmax=1)" ] }, { "cell_type": "code", "execution_count": null, "id": "6c9b398d-3cb6-43f7-bc89-3d520d58c9cc", "metadata": { "tags": [] }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "# plot ndvi timeseries for midpoint of field\n", "midpoint_x = data_cube[\"x\"][round((len(data_cube[\"x\"])) / 2)]\n", "midpoint_y = data_cube[\"y\"][round((len(data_cube[\"y\"])) / 2)]\n", "\n", "plt.plot(data_cube.time, data_cube.NDVI.sel(x=midpoint_x, y=midpoint_y), \"-o\")" ] }, { "cell_type": "markdown", "id": "663d50d5-a2c2-40e8-855d-74ac163278bb", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This us-east-2 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/sagemaker-geospatial|geospatial-processing-pipeline|geospatial_pipeline_processing.ipynb)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b13630dc-5b50-47c0-bd31-eff49869ff86", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, 
"_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, 
"category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { 
"_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, 
"name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": 
"ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4de.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 57, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.trn1.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 58, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.trn1.32xlarge", "vcpuNum": 128 }, { "_defaultOrder": 59, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.trn1n.32xlarge", "vcpuNum": 128 } ], "instance_type": "ml.geospatial.interactive", "kernelspec": { "display_name": "Python 3 (Geospatial 1.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:081189585635:image/sagemaker-geospatial-v1-0" }, "language_info": { 
"codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" } }, "nbformat": 4, "nbformat_minor": 5 }

archived/geospatial/geospatial_pipeline_processing/geospatial_pipeline_processing.ipynb (1,874 lines of code) (raw):